Training Poisson's Equation Encoder

Here is a demonstration on how to train the encoder to learn the parameters $\theta$ of a partial differential equation (PDE) given observations of the solution $u_\theta$. Now, technically for this procedure we don't actually need $u_\theta$ to be the solution of a PDE, it just needs to be some family of functions parameterized by $\theta$. But we will consider the PDE setting, as it is the desired usage of this autoencoder.

To that end, let $\theta = (c, b_0, b_1)$ be (constant) parameters for the Poisson equation with Dirichlet boundary conditions on the unit interval. That is, $u_\theta$ is the solution to

$$ \begin{align*} -\Delta u &= c \quad\text{on}\quad [0,1] \\ u(0) &= b_0 \\ u(1) &= b_1 \end{align*} $$

Since $c$ is constant, this equation is readily solved for $u_\theta = u(x)$ as the family of parabolas

$$ u(x) = -\frac{c}{2} x^2 + \left( \frac{c}{2} + b_1 - b_0 \right) x + b_0 $$

Now we will walk through training the encoder in five main steps:

  1. Write the configuration file
  2. Load the data
  3. Design the neural network
  4. Train the network
  5. Assess model fit

1 Poisson configuration file

To train the encoder to learn $\theta$, we simply create a Python script equations/poisson.py which contains the implementations of functions that define the domain, compute replicates of $\theta$, compute $u_\theta$ for a given replicate, and a list of parameter names (optional). There is a template configuration file in the equations directory. Let's briefly walk through the Poisson configuration file.

Parameter names

First we import numpy and create a list containing strings to name the parameters in $\theta$. In this case the names are $c$, $b_0$, and $b_1$. The strings may be formatted with matplotlib's TeX rendering syntax.

In [1]:
import numpy as np 

# PARAMETER NAMES
theta_names = [r'$c$', r'$b_0$', r'$b_1$']

Domain lattice

Next we write a quick function that computes a lattice over the domain, in this case the unit interval.

In [2]:
# DOMAIN
def domain():
    """
    The unit interval as an array of length 100
    """
    return np.linspace(0., 1., num=100)

Solution $u_\theta$

Now we write a function that takes $\theta$ as the input, and computes the solution function $u_\theta = u(x)$ over the lattice defined above. This should be written assuming the argument theta is a one-dimensional numpy array that contains the parameter values.

In [3]:
# SOLUTION
def solution(theta):
    """
    Computes the solution u where theta=[c, b0, b1]
    """
    x = domain()
    c = theta[0]
    b0 = theta[1]
    b1 = theta[2]
    u = -c * x**2 / 2 + (c/2 + b1 - b0)*x + b0
    return u

Priors on $\theta$

Now to simulate replicates of our parameters, we draw from uniform distributions and stack the arrays so that the output has shape (replicates, num_params).

In [4]:
# THETA
def theta(replicates):
    """
    Simulates replicates of Theta
    Output has shape (replicates, num_params)
    """
    # create replicates of theta=[c, b0, b1]
    c = np.random.uniform(-100,100, size=replicates)
    b0 = np.random.uniform(-10,10, size=replicates)
    b1 = np.random.uniform(-10,10, size=replicates)
    theta = np.stack((c, b0, b1), axis=1)
    return theta

With equations/poisson.py properly configured we need only run python train_encoder.py --trainon poisson. The file train_encoder.py has three main parts to it: load data, define neural network hyperparameters, train model. Instead of running this script, let's break it down here.

2 Load data

The modules loaded in train_encoder.py are in the Poisson package. We load (and possibly simulate) data using the Poisson.equation.Dataset class, which simply needs the name of the configuration file, excluding the file suffix.

In [5]:
from Poisson.equation import Dataset

# LOAD DATA
dataset = Dataset('poisson')
dataset.load()

The data are now loaded into the Dataset instance dataset, already split into a training, validation, and test set. It loads the data from the pickled file data/poisson.pkl, but if the file doesn't exist the data are simulated according to the configuration file using a default of 2000 replicates. The parameter names and functions defined in the configuration file are loaded and stored in the Dataset class. We can use this to explore the data before designing the network. Observe, we'll plot $u_\theta = u(x)$ for two realizations of $\theta$, with the parameter values in the titles.

In [6]:
dataset.plot_solution()

Indeed, we can confirm the left and right endpoints of the above curves are given by $b_0$ and $b_1$, respectively. Furthermore, the parameter $c$ measures the negative curvature of the solutions, as is evident from the Poisson equation.

3 Design the neural net

Henceforth, we let $\Phi$ denote observed data that acts as inputs to the neural net. The ground truth parameter is $\theta_\Phi$ while the parameters that are varying with the weights are $\theta$. Thus, $u_\theta$ represents a reconstruction of $\Phi$.

Now that the data are loaded, we can design a neural network to train. We will construct a feed-forward network with one hidden layer containing 20 nodes. By default the data are rescaled to be in the interval $[-1, 1]$, so $\tanh$ activations are a natural choice. Since this is a regression task, the final layer will be activated with the identity function $f(x) = x$. To avoid overfitting we'll drop 10% of the connections between the hidden and final layer. The optimizer and loss will be Adam and mean square error, respectively. Let's train the network for 10 epochs using batch sizes of 25.

Conveniently, this is all recorded in a design dictionary below. Most keys make intuitive sense, but the unit_activations key is perhaps more cryptic. It is a list of tuples, each tuple represents the number of units and activation for each layer (in this case 2 layers). We'll leave the callbacks list empty for now, but if we were training longer we might decide to use a learning rate schedule or Tensorboard callbacks.

In [7]:
out_units = dataset.train[1].shape[1]

## DEFINE MODEL PARAMETERS
design = {'unit_activations':[(20, 'linear'),
                              (out_units, 'linear')],
          'dropout':[0.],
          'optimizer':'adam',
          'loss':'mse',
          'callbacks':[],
          'batch_size':25,
          'epochs':25,
         }

4 Train!

All that is left is to train the encoder. We use the Encoder class from the module Poisson.encoder, which inherits from the Sequential class in Keras. For reproducible results we'll set a seed in Tensorflow before training.

In [8]:
from Poisson.encoder import Encoder
from tensorflow.random import set_seed
set_seed(23) # for reproducibility

# TRAIN MODEL
model = Encoder(design, dataset)
model.train()
Model: "model"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
Phi (InputLayer)             [(None, 100)]             0         
_________________________________________________________________
dense (Dense)                (None, 20)                2020      
_________________________________________________________________
theta (Dense)                (None, 3)                 63        
=================================================================
Total params: 2,083
Trainable params: 2,083
Non-trainable params: 0
_________________________________________________________________

5 Assessing model fit

Great, training complete! Now let's see how the network performs when predicting on a test set. We'll use the Encoder.plot_theta_fit() method to predict on the test set, print the component-wise MSE, and plot the predictions against the ground truth. Since $c$ and $b$ were simulated on different scales we should compare them on a common scale by dividing by the max absolute value of the predicted and true parameters.

5.1 Point estimates

Predicted $\theta$ on test set

In [9]:
model.plot_theta_fit()
theta MSE: [3.36925490e-08 2.10606274e-09 1.79584851e-09]
transformed theta MSE: [4.32112804e-12 2.78304833e-11 3.26881466e-11]

Not too shabby. Now let's see how the encoder performs when we hand it noisy data. That is, we test it on $\Phi + \varepsilon$ where $\varepsilon\sim\operatorname{Normal}(0,\sigma)$.

Predicted $\theta$ on noisy test set

In [10]:
model.plot_theta_fit(sigma=1.0)
theta MSE: [11.75830061  2.52288904  2.18767678]
transformed theta MSE: [0.00138415 0.0336355  0.02991483]

Since the purpose is to attach $\theta$ to a decoder that outputs $u_{\theta}$, an approximation of $\Phi$, we should totally decode the predicted parameters to produce the analytic solutions they give rise to. As expected, we can see the true and predicted solutions are quite close.

Predicted $u_\theta$ on test set

In [11]:
model.plot_solution_fit()
Latent theta MSE: [3.36925490e-08 2.10606274e-09 1.79584851e-09]
Reconstructed Phi MSE: 9.348119000467632e-10

Looks pretty good. Just as before, let's see the solution space fit when we hand the model $\Phi + \varepsilon$.

Predicted $u_\theta$ on noisy test set

In [12]:
model.plot_solution_fit(sigma=1.0)
Latent theta MSE: [11.75830061  2.52288904  2.18767678]
Reconstructed Phi MSE: 2.7145506565423383

5.2 Bootstrap uncertainty

Train on clean, test on noisy

In [13]:
num_boots = 100
model.bootstrap(num_boots, train_sigma=0, test_sigma=1.0)
Bootstrapping with 100 samples of size 720
done
In [14]:
model.plot_theta_boot()
In [15]:
model.plot_solution_boot()

Train on noisy, test on clean

In [16]:
model2 = Encoder(design, dataset)
model2.bootstrap(num_boots, train_sigma=1.0, test_sigma=0)
Bootstrapping with 100 samples of size 720
done
In [17]:
# bootstrap predictions on Phi + normal(0, sigma)
model2.plot_theta_boot()
In [18]:
# bootstrap predictions on Phi + normal(0, sigma)
model2.plot_solution_boot()

Train and test on noisy

In [19]:
model3 = Encoder(design, dataset)
model3.bootstrap(num_boots, train_sigma=1.0, test_sigma=1.0)
Bootstrapping with 100 samples of size 720
done
In [20]:
model3.plot_theta_boot()
In [21]:
model3.plot_solution_boot()

Compare models

Noisy train, clean test

In [22]:
model2.plot_solution_boot()

Clean train, noisy test

In [23]:
model.plot_solution_boot()

Noisy train, noisy test

In [24]:
model3.plot_solution_boot()